Quantal phonetics and distinctive features
Authors: G. N. Clemens and R. Ridouane
Abstract
This paper reviews some of the basic premises of Quantal-Enhancement Theory as developed by K.N. Stevens and his colleagues. Quantal theory seeks to explain why some articulatory and acoustic dimensions are favored over others in distinctive feature contrasts across languages. In this paper, after a review of basic concepts, a protocol for quantal feature definitions is proposed and problems in the interpretation of vowel features are discussed.

The quantal basis of distinctive features

Though most linguists and phoneticians agree that the distinctive features of spoken languages are realized in terms of concrete physical and auditory properties, there is little agreement on exactly how they are defined. According to a tradition launched by Jakobson and his collaborators (for example, Jakobson, Fant and Halle 1952), features are defined mainly in the acoustic (or perhaps auditory) domain. In a second tradition initiated by Chomsky and Halle (1968), features are defined primarily in articulatory terms. After several decades of research, these conflicting approaches have not yet led to any widely accepted synthesis.

In recent years, a new initiative has emerged within the framework of the Quantal Theory of speech, developed by K.N. Stevens and his colleagues (e.g. Stevens 1989, 2002, 2005). This theory maintains that the universal set of features is not arbitrary, but can be deduced from the interactions between the articulatory parameters of speech and their acoustic effects. The central claim is that there are phonetic regions in which the relationship between an articulatory configuration and its corresponding acoustic output is not linear. Within such regions, small changes along the articulatory dimension have little effect on the acoustic output. It is such regions of acoustic stability that define the articulatory inventories used in natural languages.
In other words, these regions form the basis for a universal set of distinctive features, each of which corresponds to an articulatory-acoustic coupling within which the auditory system is insensitive to small articulatory movements.

A simple example of an acoustic-articulatory coupling can be found in the parameter of vocal tract constriction. Degrees of constriction can be ordered along an articulatory continuum extending from a large opening (as in low vowels) to complete closure (as in simple oral stops). In most voiced non-nasal sounds, the passage along a scale of successively greater degrees of constriction gives rise to three relatively stable acoustic regions with separate and well-defined properties. Sounds generated with an unobstructed vocal tract, such as vowels, semivowels, and liquids, are classified as sonorants. A sudden change in the acoustic output occurs when the constriction degree passes the critical threshold for noise production (the Reynolds number, see Catford 1977), giving rise to continuant obstruent sounds (fricatives). A further discontinuity occurs when the vocal tract reaches complete closure, corresponding to the configuration for noncontinuant obstruents (oral stops). These relations are shown for voiced sounds in Figure 1, where the three stable regions correspond to the three plateaux.

Figure 1. Continuous changes along the articulatory parameter “constriction degree” define three stable acoustic regions in voiced sounds.

In voiceless sounds, the falling slope in this figure shifts some distance to the right (to around 90 mm). The region between the shifted and unshifted slopes (about 20 to 90 mm), corresponding to voiceless noise production, defines the class of approximant sounds (liquids, high semivowels, etc.), whose acoustic realization is noiseless when they are voiced but noisy when they are voiceless (Catford 1977).
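The way these thresholds partition the constriction continuum can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two boundary values are the approximate figures quoted above for Figure 1 (in the units given in the text), complete closure is modelled as a value of zero, and the function names and class labels are hypothetical.

```python
# Approximate boundaries quoted in the text for Figure 1 (assumed, not exact).
VOICED_NOISE_LIMIT = 20.0     # below this, even voiced airflow becomes noisy
VOICELESS_NOISE_LIMIT = 90.0  # below this, voiceless airflow becomes noisy


def is_noisy(constriction: float, voiced: bool) -> bool:
    """Return True if an open (non-closed) constriction of this size yields noise."""
    limit = VOICED_NOISE_LIMIT if voiced else VOICELESS_NOISE_LIMIT
    return 0.0 < constriction < limit


def major_class(constriction: float) -> str:
    """Map a constriction value onto one of the stable regions described above."""
    if constriction < 0.0:
        raise ValueError("constriction cannot be negative")
    if constriction == 0.0:
        return "stop"         # complete closure
    if constriction < VOICED_NOISE_LIMIT:
        return "fricative"    # noisy whether voiced or voiceless
    if constriction < VOICELESS_NOISE_LIMIT:
        return "approximant"  # noisy only when voiceless
    return "vowel-like"       # noiseless whether voiced or voiceless


if __name__ == "__main__":
    for value in (0.0, 10.0, 50.0, 120.0):
        print(value, major_class(value), is_noisy(value, True), is_noisy(value, False))
```

The point of the sketch is that the class label changes only at a boundary: any two values inside the same region receive the same label, which is the sense in which a region is stable.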
Languages prefer to exploit articulations that correspond to each of the four stable regions defined in this way. These regions give rise to the features which define the major classes of speech sounds, as shown in Table 1. (The feature [+vocalic], used here to define vowels and semivowels, is equivalent to the classical feature [-consonantal].)

Table 1. The four major classes of speech sounds.

               stops   fricatives   approximants   vocoids
[continuant]   no      yes          yes            yes
[sonorant]     no      no           yes            yes
[vocalic]      no      no           yes/no         yes

These features are commonly used across languages. All known languages have stops and vowels, and most have fricatives and approximants as well.

A protocol for quantal feature definitions

A feature definition, if it is quantal, must identify an articulatory continuum associated with one or more acoustic discontinuities, and must specify the range within this continuum that corresponds to relatively stable regions in the related acoustic output. The range is the articulatory definition of the feature, and the associated output is the acoustic definition. A feature definition must also identify the stable region in terms specific enough to distinguish it from other regions, yet general enough to apply to all articulations within this region, allowing for observed crosslinguistic variation. It must effectively distinguish segments bearing this feature (e.g. /t/) from otherwise similar segments that do not (e.g. /t/). Finally, it must identify the classes of sounds in which the definition holds. This will usually be the class in which the feature is at least potentially distinctive.

As an example, consider a proposed definition of the feature [+consonantal], which distinguishes true consonants from vocoids (vowels, semivowels) and laryngeals: "The defining acoustic attribute for this feature is an abrupt discontinuity in the acoustic signal, usually across a range of frequencies.
The defining articulatory attribute is the formation of a constriction in the oral cavity that is sufficiently narrow to create such an acoustic discontinuity. This description applies to both [-sonorant] and [+sonorant] consonants." (Stevens 2004, B79).

This definition conforms to the protocol suggested above. It identifies an articulatory continuum (constriction degree) and identifies the range within this continuum ("narrow constriction") associated with a discontinuity, specifically a rapid drop in F1 frequency and amplitude, as further explained and illustrated in the extended discussion of this feature in Stevens (1998: 244-6). This definition is specific enough to distinguish [+consonantal] sounds from other sounds, yet general enough to apply to a variety of realizations, for example by the lips, tongue blade, or tongue body. Finally, the definition is general enough to hold across all consonants, including both obstruents and sonorants.

There are two general families of quantal feature definitions: a) contextual definitions, in which the acoustic or auditory cue to the feature can only be detected when the sound bearing the feature occurs in an appropriate context, and b) intrinsic definitions, in which the cue can be found within the segment itself.

The feature [+consonantal] just discussed is an example of a contextual definition, as the discontinuity in question occurs when the consonantal sound occurs in the context of a nonconsonantal sound (as in "may" or "aim"). A strong advantage of contextual cues is that they are linked to "landmarks" in the signal often associated with phoneme boundaries. Such "landmarks" are perceptually salient and tend to be rich in feature cues. It has been suggested that they may facilitate speech segmentation and lexical access (e.g. Huffman 1990, Stevens 2000, 2002).
An example of an intrinsic definition is the following, proposed for the feature [±back], which distinguishes front vowels from central and back vowels.

"[During the] forward displacement of the tongue body, the second natural frequency F2 of the vocal tract passes through the second natural frequency of the airway below the glottis, which we will call F2T, for the second tracheal resonance. For adult speakers, F2T has been observed to be in the range 1400 to 1600 Hz, and it is relatively constant for a given speaker. As F2 passes through F2T, the spectrum prominence corresponding to F2 often does not move smoothly, but exhibits a discontinuity or abrupt jump in frequency. Thus there tends to be a range of values of F2 within 100 Hz or so where the frequency of the spectrum prominence is unstable. It appears that languages avoid vowels with F2 in or close to this region... and put the F2 [of] their vowels on one side or the other of this region; corresponding to [+back] vowels for lower F2 and [-back] vowels for higher F2. Thus there appears to be a dividing line between two regions with a low F2 for a backed tongue body position and a high F2 for a fronted tongue body position." (Stevens 2004, B79-80)

This definition again follows the protocol. The articulatory continuum is tongue fronting (assuming a central position at rest), and the two stable regions correspond to positions in which the associated F2 is either above or below F2T. The definition is specific enough to distinguish this feature from others, but general enough to apply to various types of front, central and back vowels as well as to the same vowel in different contexts. Finally, it identifies the class of sounds in which the definition holds (vowels). This is an intrinsic definition, since to apply it we need only examine the internal properties of the vowel.
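The dividing line in this definition can be illustrated with a short Python sketch. It assumes that a speaker's F2T is already known, and it reads "within 100 Hz or so" as a margin on either side of F2T, which is only one possible reading of the quotation. The function name, the parameter names and the example frequencies are hypothetical.

```python
def backness(f2_hz: float, f2t_hz: float, margin_hz: float = 100.0) -> str:
    """Classify a vowel as [+back] or [-back] from F2 relative to the speaker's F2T.

    margin_hz is an assumed width for the unstable zone around F2T; values of F2
    inside that zone are reported as "unstable" rather than assigned a feature.
    """
    if abs(f2_hz - f2t_hz) <= margin_hz:
        return "unstable"
    # Lower F2 corresponds to [+back], higher F2 to [-back].
    return "[-back]" if f2_hz > f2t_hz else "[+back]"


if __name__ == "__main__":
    speaker_f2t = 1500.0  # illustrative value inside the 1400 to 1600 Hz range cited
    for f2 in (900.0, 1550.0, 2200.0):
        print(f2, backness(f2, speaker_f2t))
```

Because F2T is passed in per speaker, the same F2 value may be classified differently for speakers with different tracheal resonances, in keeping with the observation that F2T is relatively constant for a given speaker.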
An advantage of using an intrinsic definition in this case is that it accounts for the fact that vowels can usually be identified as front or back in isolation. Another is that vowels typically occur next to consonants, in which F2 is less prominent or absent. (Landmark effects can be found in front-to-back vowel transitions, as in the transition from [a] to [i] (Honda & Takano 2006), but vowels in hiatus are too infrequent in most languages to provide a primary basis for feature definition.)

Quantal acoustic-auditory relations

Further types of discontinuity can be found among certain acoustic-auditory relations (Stevens 1989). We consider an example involving vowels.

Vowels are often considered problematic for quantal analysis, and it has been suggested that they may organize themselves instead according to an inherently gradient principle of maximal dispersion in perceptual space (e.g. Lindblom 1986). However, the fact that vowels pattern in terms of natural classes just as consonants do suggests that they are also organized in terms of features (see much phonological literature, as well as Schwartz et al. 1997: 281), raising the question of what these features are, and whether they are also quantal. A proposed quantal definition for the feature [±back] has been cited above, based on a region of F2 instability located in the mid-frequency range. Here we will examine evidence for the same feature from natural acoustic/auditory discontinuities.

Vowel-matching experiments have shown that vowel formant patterns are perceived not just on the basis of individual formant frequencies, but also according to the distance between formants. In such experiments, synthetic vowels with several formants are matched against synthetic one- or two-formant vowels. Subjects are asked to adjust the frequency of the only (or the higher) formant of the latter vowel so that it matches the former as closely as possible in quality.
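Distances between formants in experiments of this kind are usually expressed on the auditory Bark scale, not in Hz. As a minimal sketch, the code below assumes Traunmüller's (1990) approximation for converting Hz to Bark, a conversion the paper itself does not specify, and a critical distance that defaults to the estimate of roughly 3.5 bark attributed to Chistovich & Lublinskaja (1979). It checks whether two formants are close enough to be candidates for perceptual integration. The function names and example frequencies are illustrative.

```python
def hz_to_bark(freq_hz: float) -> float:
    """Convert Hz to Bark with Traunmueller's (1990) approximation.

    The published corrections for the extreme low and high ends of the scale
    are omitted here for brevity.
    """
    if freq_hz <= 0.0:
        raise ValueError("frequency must be positive")
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_distance(f_low_hz: float, f_high_hz: float) -> float:
    """Absolute distance between two frequencies on the Bark scale."""
    return abs(hz_to_bark(f_high_hz) - hz_to_bark(f_low_hz))


def may_integrate(f_low_hz: float, f_high_hz: float, critical_bark: float = 3.5) -> bool:
    """True if the two formants lie closer together than the critical distance."""
    return bark_distance(f_low_hz, f_high_hz) < critical_bark


if __name__ == "__main__":
    # Illustrative values: a close F2/F3 pair and a widely spaced F1/F2 pair.
    print(round(bark_distance(2000.0, 2500.0), 2), may_integrate(2000.0, 2500.0))
    print(round(bark_distance(300.0, 2300.0), 2), may_integrate(300.0, 2300.0))
```

With these illustrative values, 2000 and 2500 Hz lie about 1.5 bark apart and fall under the default threshold, while 300 and 2300 Hz lie more than 10 bark apart and do not. Passing a different `critical_bark` allows other estimates of the threshold to be tried.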
Results show that when two formants in the normal range for F1 and F2 are well separated, they tend to be heard as two separate spectral peaks, but when two formants approach each other more closely than a certain threshold distance, their mutual amplitude is strongly enhanced and they are perceptually integrated into a single peak whose value is intermediate between the two acoustic formants. The crucial threshold for this integration is usually estimated at a value around 3.5 bark (Chistovich & Lublinskaja 1979). The implication of these experiments is that some aspect of the response of the auditory system undergoes a qualitative change, a discontinuity, when the distance between two spectral prominences falls under a critical value.

Experiments with data involving Swedish vowels have confirmed this effect for higher formants as well (Carlson et al. 1970). In these experiments, synthetic vowels with five formants were matched against two-formant synthetic vowels. The first-formant frequency was the same for both vowels. Subjects were asked to adjust the second frequency F2' of the two-formant vowel to give the best match in quality to the corresponding five-formant vowel.

The results of the experiment are shown in Figure 2. Here, the frequencies of the first four formants in Hz are shown as lines and the F2' frequencies of the matching vowel are shown as rectangles. When the spacing between F2 and F3 is less than about 3.0 bark, as it was for the front vowels (the first six in the figure), subjects place F2' at a frequency between F2 and F3 for all vowels except /i/. (In /i/, in which F3 is closer to F4 than to F2, they place F2' between F3 and F4.) In back vowels, in which higher formants have very low amplitude, F2' is placed directly on F2.

Figure 2. Results of a matching experiment in which subjects adjusted the frequency F2' of a two-formant vowel to give the best match in quality to each of nine Swedish five-formant vowels; only the four lowest formants are shown here. (After Carlson et al. 1970.)

These results indicate that there is a critical spacing of higher formants (F2, F3 and F4) leading to the interpretation of closely-grouped two-peak spectral prominences as single broad perceptual prominences. They give independent support for the view that the feature [±back] has a natural basis, in this case in terms of audition. We see that for [-back], but not [+back] vowels, the distance in Hz between F1 and the effective F2' is always greater than the distance between F1 and the acoustic F2. In other words, perception magnifies the front/back vowel distinction present in the acoustic structure.

While the difference between [-back] and [+back] vowels seems well-founded in quantal terms, it is much less clear that other features, such as those of vowel height and lip rounding, can be defined in these terms. For example, there is no obvious discontinuity in the comparison of Swedish [+high] /u/ and [-high] /o/ in Figure 2. For reasons such as these, phoneticians usually tend to speak of quantal vowels rather than of quantal features. Quantal vowels are those in which two formants approach each other maximally, an effect known as focalisation (Schwartz et al. 1997). It is sometimes thought that /i/, /u/, /a/ and perhaps /y/ or /æ/ may constitute quantal vowels in this sense, though experimentally-based, multispeaker data bearing on this question is still rather scarce.

We do not propose, however, to abandon the search for nongradient definitions for vowel features. We tentatively suggest that features of vowel height (setting aside the problematic feature [±ATR]) may be defined in terms of the absolute boundary values set by the upper and lower range of each speaker.
On this view, a vowel bearing the feature [+high] would be one whose perceived lowest prominence (let us call it P1) falls within an auditorily indistinguishable subrange of values at the bottom of a given speaker's total range of values for this prominence, while a [+low] vowel would be one whose perceived lowest prominence falls within the corresponding subrange at the top. A mid vowel, bearing the values [-high, -low], would be defined as falling within neither of these subranges. In other words, the speaker's total range of values for a given prominence Pn establishes the frame of reference with respect to which a given production is evaluated.

While this account is not strictly quantal (as there appears to be no natural discontinuity as we pass up and down the vowel height scale), it has the advantage of tying the feature definition to a set of fixed reference points, defined in a way that is applicable to any speaker, regardless of the size and shape of their vocal tract. If it is true that vowel identification is more reliable as a vowel's values approach the periphery of the vowel triangle (see Polka & Bohn 2003), we can explain why distinctions among mid vowels (such as /e/ vs. /ε/) are much less stable across languages, in both historical and synchronic terms, than distinctions involving high vs. mid or mid vs. low vowels. These suggestions are quite tentative, of course, and we believe that future research should continue to seek possible quantal correlates of vowel height.
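The speaker-relative proposal for vowel height can be sketched as follows. This is a hedged illustration only: the text gives no value for the width of the "auditorily indistinguishable subrange", so `margin` is a hypothetical parameter, and the speaker range and P1 values in the example are invented for demonstration. Any perceptual frequency unit can be used, provided all arguments share it.

```python
def height_features(p1: float, range_min: float, range_max: float, margin: float) -> str:
    """Label vowel height from the lowest prominence P1, relative to a speaker's range.

    range_min and range_max bound the speaker's total range of P1 values.
    margin is the assumed width of the subrange at each edge that counts as
    indistinguishable from the edge itself.
    """
    if not range_min < range_max:
        raise ValueError("range_min must be below range_max")
    if margin < 0.0 or 2.0 * margin >= range_max - range_min:
        raise ValueError("margin must be non-negative and leave room for mid vowels")
    if p1 <= range_min + margin:
        return "[+high, -low]"  # P1 sits at the bottom of the speaker's range
    if p1 >= range_max - margin:
        return "[-high, +low]"  # P1 sits at the top of the speaker's range
    return "[-high, -low]"      # mid vowel: in neither edge subrange


if __name__ == "__main__":
    # Invented speaker range of 2.5 to 7.5 with a margin of 0.5.
    for p1 in (2.8, 5.0, 7.3):
        print(p1, height_features(p1, 2.5, 7.5, 0.5))
```

Because the reference points are the speaker's own extremes, the same P1 value can come out as [+high] for one speaker and as mid for another, which is the speaker-independence that the proposal aims for.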
Publication date: 2006